A Generalized Fellegi-Sunter Framework for Multiple Record Linkage With Application to Homicide Record-Systems

Authors

  • Mauricio Sadinle
  • Stephen E. Fienberg
  • Maurice Falk
Abstract

We present a probabilistic method for linking multiple datafiles. This task is not trivial in the absence of unique identifiers for the individuals recorded. This is a common scenario when linking census data to coverage measurement surveys for census coverage evaluation, and in general when multiple record-systems need to be integrated for subsequent analysis. Our method generalizes the Fellegi–Sunter theory for linking records from two datafiles and its modern implementations. The goal of multiple record linkage is to classify the record K-tuples coming from K datafiles according to the different matching patterns. Our method incorporates the transitivity of agreement in the computation of the data used to model matching probabilities. We use a mixture model to fit matching probabilities via maximum likelihood using the EM algorithm. We present a method to decide the membership of record K-tuples in the subsets of matching patterns, and we prove its optimality. We apply our method to the integration of the three Colombian homicide record systems and perform a simulation study to explore the performance of the method under measurement error and different scenarios. The proposed method works well and opens new directions for future research.
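The matching patterns for record K-tuples can be made concrete with a small sketch. Assuming, for illustration, that a matching pattern is a partition of the K datafiles into groups whose records refer to the same individual, the patterns can be enumerated recursively. The function name `set_partitions` and the file labels are hypothetical; this is not the authors' code. Because every pattern is a partition, transitivity holds by construction: no pattern can say that A matches B and B matches C while A does not match C.

```python
from typing import Iterator, List


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` into non-empty blocks (hypothetical helper)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


# Each block groups the files whose records in a K-tuple refer to one individual.
files = ["A", "B", "C"]
patterns = list(set_partitions(files))
for pattern in patterns:
    print(pattern)
print(len(patterns))  # 5 patterns for three files
```

For three files this gives five patterns: all records distinct, exactly one of the three pairs matching, or all three records matching. The count grows as the Bell numbers (5 for K = 3, 15 for K = 4), so the number of patterns rises quickly with K.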

Similar resources

G-LINK: A Probabilistic Record Linkage System

At Statistics Canada, matching data without unique identifiers is a common practice. The probabilistic record linkage method developed by Ivan Fellegi and Allan Sunter is the primary method recommended by Statistics Canada for this type of matching. In recent decades, work began to generalize the Fellegi–Sunter algorithm in order to offer our community the opportunity to use this methodology ...

Data Cleaning Methods

Data Cleaning methods are used for finding duplicates within a file or across sets of files. This overview provides background on the Fellegi-Sunter model of record linkage. The Fellegi-Sunter model provides an optimal theoretical classification rule. Fellegi and Sunter introduced methods for automatically estimating optimal parameters without training data that we extend to many real world sit...
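A minimal sketch of the classical two-file decision rule, assuming binary agree/disagree comparisons that are conditionally independent given match status. The m- and u-probabilities and the two thresholds below are made-up illustrative numbers (in practice the thresholds are derived from target error rates), and the function names are hypothetical.

```python
import math
from typing import Sequence


def match_weight(gamma: Sequence[int], m: Sequence[float], u: Sequence[float]) -> float:
    """log P(gamma | match) / P(gamma | non-match), assuming conditional independence."""
    w = 0.0
    for g, mk, uk in zip(gamma, m, u):
        if g:  # field agrees
            w += math.log(mk / uk)
        else:  # field disagrees
            w += math.log((1 - mk) / (1 - uk))
    return w


def classify(weight: float, upper: float, lower: float) -> str:
    """Three-way decision with two thresholds (upper > lower)."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # typically sent to clerical review


# Illustrative parameters for three compared fields (assumed, not estimated).
m = [0.95, 0.90, 0.80]
u = [0.01, 0.10, 0.30]
for gamma in [(1, 1, 0), (0, 0, 0), (0, 1, 1)]:
    w = match_weight(gamma, m, u)
    print(gamma, round(w, 3), classify(w, upper=4.0, lower=0.0))
```

The middle band between the two thresholds is what keeps the rule from forcing a hard decision on ambiguous comparison vectors.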

Approaches to Multiple Record Linkage

We review the theory and techniques of record linkage that date back to pioneering work by Fellegi and Sunter on matching records in two lists. When the task involves linking K > 2 lists, the most common approach consists of linking all $\binom{K}{2}$ possible pairs of lists using a Fellegi-Sunter-like approach and then somehow reconciling the discrepancies in an ad hoc fashion. We describe some im...
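To see why pairwise linkage needs a reconciliation step, here is a minimal sketch that flags intransitive triples: one record from each of three lists where exactly two of the three pairwise comparisons were declared links. The record identifiers such as "A1" and the function name are hypothetical, and the brute-force loop over all triples is only meant for small illustrative inputs.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Set, Tuple


def intransitive_triples(
    records_by_list: Dict[str, List[str]],
    linked: Set[FrozenSet[str]],
) -> List[Tuple[str, str, str]]:
    """Return triples (one record per list) where exactly two of three pairs are linked."""
    bad = []
    for la, lb, lc in combinations(sorted(records_by_list), 3):
        for a, b, c in product(records_by_list[la], records_by_list[lb], records_by_list[lc]):
            n_links = sum(frozenset(pair) in linked for pair in ((a, b), (a, c), (b, c)))
            if n_links == 2:  # A~B and B~C but not A~C (or a permutation): inconsistent
                bad.append((a, b, c))
    return bad


# Hypothetical pairwise decisions from linking lists A-B, A-C and B-C separately.
records = {"A": ["A1", "A2"], "B": ["B1"], "C": ["C1"]}
links = {frozenset({"A1", "B1"}), frozenset({"B1", "C1"})}
print(intransitive_triples(records, links))  # [('A1', 'B1', 'C1')]
```

Adding the link A1-C1, or dropping one of the two existing links, removes the contradiction; deciding which of those to do is exactly the ad hoc reconciliation that pairwise linkage leaves open.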

The State of Record Linkage and Current Research Problems

This paper provides an overview of methods and systems developed for record linkage. Modern record linkage begins with the pioneering work of Newcombe and is especially based on the formal mathematical model of Fellegi and Sunter. In their seminal work, Fellegi and Sunter introduced many powerful ideas for estimating record linkage parameters and other ideas that still influence record linkage ...
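The idea of estimating linkage parameters without training data can be sketched with the EM algorithm for the classical two-file case. This sketch assumes binary comparison vectors, conditional independence of fields given match status, and a two-component mixture (matches vs. non-matches); it is not the generalized K-file model of the paper above. The simulated data, starting values, and function name are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def fellegi_sunter_em(
    gamma: Sequence[Sequence[int]],
    p: float = 0.1,
    n_iter: int = 500,
    tol: float = 1e-10,
) -> Tuple[float, List[float], List[float], List[float]]:
    """Fit p = P(match), m_k = P(agree on k | match), u_k = P(agree on k | non-match)."""
    n_fields = len(gamma[0])
    m = [0.9] * n_fields  # start with m > u so component 1 is read as "match"
    u = [0.1] * n_fields
    eps = 1e-6  # keeps probabilities away from exactly 0 and 1
    g: List[float] = []
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        g = []
        for row in gamma:
            lm = math.prod(mk if x else 1 - mk for x, mk in zip(row, m))
            lu = math.prod(uk if x else 1 - uk for x, uk in zip(row, u))
            g.append(p * lm / (p * lm + (1 - p) * lu))
        # M-step: weighted agreement frequencies within each component.
        w_m = sum(g)
        w_u = len(g) - w_m
        m = [min(max(sum(gi * row[k] for gi, row in zip(g, gamma)) / w_m, eps), 1 - eps)
             for k in range(n_fields)]
        u = [min(max(sum((1 - gi) * row[k] for gi, row in zip(g, gamma)) / w_u, eps), 1 - eps)
             for k in range(n_fields)]
        new_p = w_m / len(g)
        converged = abs(new_p - p) < tol
        p = new_p
        if converged:
            break
    return p, m, u, g


# Simulated comparison vectors with known (assumed) parameters.
rng = random.Random(0)
true_p, true_m, true_u = 0.2, [0.95, 0.9, 0.85, 0.9], [0.05, 0.1, 0.2, 0.15]
pairs = []
for _ in range(5000):
    probs = true_m if rng.random() < true_p else true_u
    pairs.append([int(rng.random() < q) for q in probs])

p_hat, m_hat, u_hat, _ = fellegi_sunter_em(pairs)
print(round(p_hat, 3), [round(x, 3) for x in m_hat], [round(x, 3) for x in u_hat])
```

Starting with m above u is a simple way to fix the labelling of the two mixture components; with well-separated fields like these, the estimates should land close to the simulated values.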

BUREAU OF THE CENSUS STATISTICAL RESEARCH DIVISION Statistical Research Report Series No. RR2000/06 Frequency-Based Matching in Fellegi-Sunter Model of Record Linkage

This paper extends techniques for frequency-based matching (see e.g., Fellegi and Sunter 1969). The extended techniques allow table-building under weaker assumptions than those typically used in practice. Although CPU requirements can increase, human intervention can be reduced in some situations.
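The core idea of frequency-based matching, in the simple form going back to Fellegi and Sunter (1969), is that agreement on a rare value is stronger evidence of a match than agreement on a common one. A minimal sketch follows. It assumes both files share the same value distribution and that errors do not depend on the value, so for a value with relative frequency f(v) we take u(v) ≈ f(v)² and m(v) ≈ m·f(v). The agreement weight then reduces to log(m / f(v)). This is a simplification, not the extended table-building techniques of the report above; the surnames and the value of m are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def frequency_weights(values: Sequence[Hashable], m: float = 0.95) -> Dict[Hashable, float]:
    """Value-specific agreement weights log(m(v) / u(v)) under the assumptions above."""
    counts = Counter(values)
    n = len(values)
    weights = {}
    for v, c in counts.items():
        f = c / n        # relative frequency of value v
        m_v = m * f      # assumed P(agree on v | match)
        u_v = f * f      # assumed P(agree on v | non-match), random pairing
        weights[v] = math.log(m_v / u_v)  # equals log(m / f)
    return weights


# Illustrative surname column (assumed data).
surnames = ["GARCIA"] * 50 + ["RODRIGUEZ"] * 30 + ["MEJIA"] * 15 + ["ZAPATA"] * 5
w = frequency_weights(surnames)
for name, weight in sorted(w.items(), key=lambda kv: kv[1]):
    print(name, round(weight, 3))
```

Agreement on the rarest surname receives the largest weight, which is the behaviour frequency-based matching is designed to capture.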

Publication date: 2015